conclusions, as the association between E and D could be 
discovered in all subgroups selected for the study. 
We agree completely that representative studies are immensely 
valuable for describing disease patterns, quantifying 
the burden of disease' and generating risk stratification 
models.^ Given representativeness is time- and place- 
specific,^ these all need regular updates and more represen- 
tative studies. For example, the SCORE (Systematic 
COronary Risk Evaluation) system for predicting fatal 
cardiovascular disease (CVD) uses the same risk factors 
in different models for high- and low-CVD-risk European 
countries,"* but over time countries may also be promoted 
from high to low risk.^ Clearly, such risk prediction mod- 
els are not scientific models that describe nature consist- 
ently across space and time,* but they are immensely useful 
for service planning, targeting treatment and saving lives. 
Conversely, experimental studies, such as animal models 
and randomized controlled trials (RCTs), do not require 
representativeness to test scientific models.*' 

On the other hand, whether observational epidemiolo- 
gical studies, representative or not, are useful for generat- 
ing hypotheses or testing causal factors in scientific models 
is less clear. First, such studies represent the triumph of hope over 
experience.^ Second, as was pointed out over 20 years ago, 
nearly all possible hypotheses have already been gener- 
ated.^ Third, some potentially relevant hypotheses may not 
be readily observed for conceptual or practical reasons. 
The current paradigm may exclude some hypotheses as im- 
possible, making them imperceptible. Apart from well- 
known biases inherent in observational studies, causal fac- 
tors may be invariant in commonly studied populations, 
expensive or difficult to measure, affected by preclinical 
disease or hidden within the (mis)classification of diseases 
by symptom rather than cause. Fourth, as a discipline we 
have not generally thought through the hierarchy of studies 
to refute a hypothesis. Our current methods, using the 
Bradford-Hill viewpoints as a touchstone, are much more 
focused on corroborating hypotheses, with an RCT as the 
pinnacle of corroboration. However, even something as 
simple as 'field' epidemiology may refute hypotheses. For 
example, the existence of populations with low birth- 
weight and low rates of heart disease casts doubt on a 
major role of birthweight in heart disease.' 

Given these issues, if we want to make progress in identifying 
causal processes in population health, assuming it is possible,'" 
rather than focusing on representativeness in studies 
used to generate or test (corroborate) hypotheses, it might be 
more useful to look for better ways to generate and screen 
plausible hypotheses, before we test them in suitable stud- 
ies.'' Other methods of generating hypotheses about the driv- 
ers of population health are not obvious, but include using 
general mechanistic principles, starting with effective treat- 
ments and taking advantage of mechanistic insights from 
genetics or RCTs which include potential mediators. Not 
only do we need to move on from the debate about representativeness, 
we need to move on to some different questions. 
We were recently invited, along with several other authors, 
to comment on the paper by Rothman et al. on 
representativeness.^ The journal editors also commented on 
Rothman's paper^ and took the opportunity to comment 
on our paper^ as well as two recent papers of ours pub- 
lished in another journal."*'' We don't wish to revive or un- 
necessarily prolong this debate (the various authors are 
largely in agreement in any case), but we would like to 
reply in order to correct some misrepresentations of our 
work. We can identify many, and it is not possible to re- 
spond to all of them in a brief letter. 

Ebrahim and Davey Smith claim we 'suggest that non- 
representative populations produce only weak bias in 
exposure-disease associations'. In fact, we argued that 'each 
population, including a selected study population, has its 
own confounding pattern', and that 'there is no reason to be- 
lieve that control of confounding can be more easily achieved 
in a population-based cohort than in a restricted cohort'. 
Either situation can be associated with bias, and there is no a 
priori reason to believe that one is always or usually more 
biased than the other (e.g. in a study of smoking and lung 
cancer, a restricted population such as British doctors may be 
less confounded than a general population sample). 

In our two recent papers we explored the effects of 
selection through a 'simulation study'"* and an 'empirical 
study' of an internet-based birth cohort.' In the simulation 
study, we considered a simplified scenario including an 
exposure E, an outcome D and a determinant R of both the 
selection S and the outcome. We simulated scenarios in 
which selection introduces bias and concluded that the bias 
is very small (a true relative risk for the exposure-outcome 
association of 1.00 becomes 1.02) in situations in which all 
relative risks involved are 2.0 or 0.5, and modest (a true rela- 
tive risk of 1.00 becomes 1.16) when all relative risks are 4.0 
or 0.25. We argued that 'it is unlikely that multiple and inde- 
pendent important disease risk factors would affect the sam- 
ple selection' (this sentence quoted by Ebrahim and Davey 
Smith) and that 'it is indeed reasonable to consider R as a 
vector resulting from the combination of a set of correlated 
risk factors, all moderately associated with S' (this part not 
quoted by Ebrahim and Davey Smith who instead suggested 
that we missed the point that multiple risk factors may play a 
role). It is important to emphasize the term 'independent' in 
the above quote: risk factors tend to cluster together, so selec- 
tion processes which are biased with respect to one risk fac- 
tor may be biased with respect to others in the same cluster; 
but, for precisely this reason, the total bias from the cluster 
of associated risk factors is usually not much greater than the 
bias from one factor alone — in either case, our estimates 
apply for the range of relative risks that we considered. 
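
The selection mechanism described above can be illustrated with a small exact calculation. The following is a minimal sketch, not the simulation reported in the cited paper: it assumes that E and R are independent binary variables in the source population, that E has no effect on D, that both E and R act on the odds of selection through a logistic model, and that selection does not otherwise depend on D. The function name, the logistic form and all default values are assumptions chosen for illustration.

```python
from itertools import product


def selected_risk_ratio(p_e=0.5, p_r=0.5, or_es=2.0, or_rs=2.0,
                        rr_rd=2.0, base_sel_odds=0.25, base_risk=0.05):
    """Exposure-outcome risk ratio among selected subjects (S = 1).

    Assumed (hypothetical) model: E and R are independent in the source
    population, E does not affect D (true risk ratio 1.00), R multiplies
    the risk of D by rr_rd, and E and R multiply the odds of selection
    by or_es and or_rs respectively.
    """
    # selected[e]: probability of being selected with E = e.
    # cases[e]: probability of being selected with E = e and developing D.
    selected = {0: 0.0, 1: 0.0}
    cases = {0: 0.0, 1: 0.0}
    for e, r in product((0, 1), repeat=2):
        # Joint probability of (E = e, R = r) in the source population.
        p_er = (p_e if e else 1 - p_e) * (p_r if r else 1 - p_r)
        # Logistic selection model: odds ratios act on the selection odds.
        sel_odds = base_sel_odds * or_es ** e * or_rs ** r
        p_sel = sel_odds / (1 + sel_odds)
        # The outcome depends on R only.
        p_d = base_risk * rr_rd ** r
        selected[e] += p_er * p_sel
        cases[e] += p_er * p_sel * p_d
    risk_exposed = cases[1] / selected[1]
    risk_unexposed = cases[0] / selected[0]
    return risk_exposed / risk_unexposed


if __name__ == "__main__":
    print(round(selected_risk_ratio(), 3))
    print(round(selected_risk_ratio(or_es=4.0, or_rs=4.0, rr_rd=4.0), 3))
```

With these defaults (all associations set to 2.0) the risk ratio among the selected is about 0.98 rather than the true 1.00; setting or_es, or_rs and rr_rd to 4.0 moves it further from 1.00. A logistic selection model is used here because, with independent E and R, a purely multiplicative selection probability would leave E and R independent among the selected and so produce no bias in this sketch.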